Overview

Dataset statistics

Number of variables: 40
Number of observations: 204
Missing cells: 531
Missing cells (%): 6.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 303.2 KiB
Average record size in memory: 1.5 KiB
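
The derived figures in this table can be checked against the raw counts. The sketch below assumes that the missing-cell percentage is taken over every cell (variables × observations) and that the average record size is total memory divided by the number of observations; the report does not state either formula, and the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 40
n_observations = 204
missing_cells = 531
total_memory_kib = 303.2

# Assumption: the percentage of missing cells is computed over all cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size is total memory divided by the row count.
avg_record_kib = total_memory_kib / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_kib:.1f} KiB")
```

Under these assumptions the computed values round to the 6.5% and 1.5 KiB shown in the table.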

Variable types

Categorical: 27
DateTime: 3
Numeric: 8
Unsupported: 2

Warnings

TP_NOT has constant value "2" Constant
ID_AGRAVO has constant value "B54" Constant
NU_ANO has constant value "2011" Constant
SG_UF has constant value "33" Constant
ID_RG_RESI has constant value "" Constant
ID_PAIS has constant value "1" Constant
DEXAME has a high cardinality: 130 distinct values High cardinality
DTRATA has a high cardinality: 82 distinct values High cardinality
SEM_NOT is highly correlated with SEM_PRI High correlation
SG_UF_NOT is highly correlated with ID_MUNICIP High correlation
ID_MUNICIP is highly correlated with SG_UF_NOT High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
ID_MUNICIP is highly correlated with ID_MN_RESI High correlation
ID_MN_RESI is highly correlated with ID_MUNICIP High correlation
COUFINF is highly correlated with RESULT and 5 other fields High correlation
PMM is highly correlated with CS_ESCOL_N and 3 other fields High correlation
CS_RACA is highly correlated with CS_ESCOL_N High correlation
RESULT is highly correlated with COUFINF and 15 other fields High correlation
AT_SINTOMA is highly correlated with RESULT and 4 other fields High correlation
ID_UNIDADE is highly correlated with LOC_INF High correlation
ID_REGIONA is highly correlated with RESULT and 7 other fields High correlation
SG_UF_NOT is highly correlated with RESULT and 7 other fields High correlation
SEM_NOT is highly correlated with DTRATA and 1 other field High correlation
DTRATA is highly correlated with COUFINF and 13 other fields High correlation
AT_LAMINA is highly correlated with RESULT and 5 other fields High correlation
CS_ESCOL_N is highly correlated with PMM and 6 other fields High correlation
ID_MUNICIP is highly correlated with RESULT and 5 other fields High correlation
ID_OCUPA_N is highly correlated with PMM and 8 other fields High correlation
NU_IDADE_N is highly correlated with CS_ESCOL_N High correlation
COMUNINF is highly correlated with COUFINF and 13 other fields High correlation
CLASSI_FIN is highly correlated with RESULT and 7 other fields High correlation
LOC_INF is highly correlated with COUFINF and 15 other fields High correlation
COPAISINF is highly correlated with RESULT and 7 other fields High correlation
DSTRAESQUE is highly correlated with COUFINF and 11 other fields High correlation
TPAUTOCTO is highly correlated with COUFINF and 8 other fields High correlation
CS_GESTANT is highly correlated with COMUNINF and 3 other fields High correlation
TRA_ESQUEM is highly correlated with RESULT and 9 other fields High correlation
SEM_PRI is highly correlated with SEM_NOT and 1 other field High correlation
AT_ATIVIDA is highly correlated with RESULT and 5 other fields High correlation
CS_SEXO is highly correlated with CS_GESTANT High correlation
PCRUZ is highly correlated with PMM and 9 other fields High correlation
ID_MN_RESI is highly correlated with DTRATA and 7 other fields High correlation
COUFINF is highly correlated with DTRATA and 7 other fields High correlation
ID_REGIONA is highly correlated with ID_OCUPA_N and 8 other fields High correlation
CS_ESCOL_N is highly correlated with ID_PAIS and 5 other fields High correlation
ID_OCUPA_N is highly correlated with ID_REGIONA and 7 other fields High correlation
DSTRAESQUE is highly correlated with DTRATA and 7 other fields High correlation
ID_PAIS is highly correlated with COUFINF and 24 other fields High correlation
NU_ANO is highly correlated with COUFINF and 24 other fields High correlation
CS_SEXO is highly correlated with ID_PAIS and 6 other fields High correlation
LOC_INF is highly correlated with DTRATA and 7 other fields High correlation
SG_UF is highly correlated with COUFINF and 24 other fields High correlation
CS_RACA is highly correlated with ID_PAIS and 5 other fields High correlation
RESULT is highly correlated with ID_REGIONA and 14 other fields High correlation
AT_SINTOMA is highly correlated with ID_PAIS and 8 other fields High correlation
SG_UF_NOT is highly correlated with ID_REGIONA and 8 other fields High correlation
TP_NOT is highly correlated with COUFINF and 24 other fields High correlation
AT_LAMINA is highly correlated with ID_PAIS and 8 other fields High correlation
COMUNINF is highly correlated with COUFINF and 9 other fields High correlation
TPAUTOCTO is highly correlated with DTRATA and 10 other fields High correlation
ID_AGRAVO is highly correlated with COUFINF and 24 other fields High correlation
CS_GESTANT is highly correlated with DSTRAESQUE and 8 other fields High correlation
ID_RG_RESI is highly correlated with COUFINF and 24 other fields High correlation
TRA_ESQUEM is highly correlated with DTRATA and 9 other fields High correlation
AT_ATIVIDA is highly correlated with ID_PAIS and 7 other fields High correlation
CLASSI_FIN is highly correlated with ID_PAIS and 9 other fields High correlation
PCRUZ is highly correlated with DTRATA and 9 other fields High correlation
DT_NASC has 4 (2.0%) missing values Missing
DT_INVEST has 204 (100.0%) missing values Missing
PMM has 119 (58.3%) missing values Missing
DT_ENCERRA has 204 (100.0%) missing values Missing
DEXAME is uniformly distributed Uniform
DT_INVEST is an unsupported type, check if it needs cleaning or further analysis Unsupported
DT_ENCERRA is an unsupported type, check if it needs cleaning or further analysis Unsupported
COPAISINF has 123 (60.3%) zeros Zeros
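
The Constant, Missing and Zeros warnings above boil down to simple per-column counts. The following is a minimal sketch of that idea, not pandas-profiling's own implementation: `column_warnings` is a hypothetical helper, `None` stands in for a missing cell, and the toy columns are synthetic lists sized only to match the counts quoted in the warnings.

```python
from typing import Dict, List, Optional


def column_warnings(columns: Dict[str, List[Optional[object]]]) -> List[str]:
    """Build report-style warning strings from raw column values (illustrative only)."""
    messages = []
    for name, values in columns.items():
        n_rows = len(values)
        present = [v for v in values if v is not None]
        n_missing = n_rows - len(present)
        n_zeros = sum(1 for v in present if v == 0)

        # Constant: all non-missing cells share a single value.
        if present and len(set(present)) == 1:
            messages.append(f'{name} has constant value "{present[0]}"')
        # Missing: report the count and its share of all rows.
        if n_missing:
            messages.append(
                f"{name} has {n_missing} ({100 * n_missing / n_rows:.1f}%) missing values"
            )
        # Zeros: report numeric zeros the same way.
        if n_zeros:
            messages.append(f"{name} has {n_zeros} ({100 * n_zeros / n_rows:.1f}%) zeros")
    return messages


# Synthetic columns sized to match the counts in the warnings (not the real data).
toy = {
    "TP_NOT": ["2"] * 204,
    "PMM": [None] * 119 + list(range(1, 86)),
    "COPAISINF": [0] * 123 + [1] * 81,
}
result = column_warnings(toy)
for line in result:
    print(line)
```

With 204 rows, 119 missing cells give 58.3% and 123 zeros give 60.3%, matching the PMM and COPAISINF warnings.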

Reproduction

Analysis started: 2021-07-06 18:40:17.729913
Analysis finished: 2021-07-06 18:40:38.912083
Duration: 21.18 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
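
The reported duration is consistent with the two timestamps. A small check using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-07-06 18:40:17.729913", FMT)
finished = datetime.strptime("2021-07-06 18:40:38.912083", FMT)

# Elapsed wall-clock time between the two report timestamps.
duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```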

Variables

TP_NOT
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 11.7 KiB

Value  Count
2      204

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 204
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value  Count  Frequency (%)
2      204    100.0%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
2      204    100.0%

Most occurring characters

Value  Count  Frequency (%)
2      204    100.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  204    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      204    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  204    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
2      204    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  204    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
2      204    100.0%

ID_AGRAVO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size15.3 KiB
B54
204 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters: 612
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
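
As an illustration of these character properties, the character and category counts for ID_AGRAVO can be reproduced with Python's standard `unicodedata` module. This is a sketch, not the report's own code; it relies on the column being the constant "B54" in all 204 rows, as stated above. The standard library exposes the general category (for example "Lu" for Uppercase Letter and "Nd" for Decimal Number) but not the script or block, so only the first two counts are shown.

```python
import unicodedata
from collections import Counter

values = ["B54"] * 204  # ID_AGRAVO is constant across all 204 rows

# Count every character across all values.
char_counts = Counter(ch for value in values for ch in value)

# Group characters by Unicode general category.
category_counts = Counter(
    unicodedata.category(ch) for value in values for ch in value
)

total_characters = sum(char_counts.values())
print(total_characters)
print(dict(char_counts))
print(dict(category_counts))
```

The totals agree with the report: 612 characters, of which 408 are decimal digits and 204 are uppercase letters.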

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowB54
2nd rowB54
3rd rowB54
4th rowB54
5th rowB54

Common Values

ValueCountFrequency (%)
B54204
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
b54204
100.0%

Most occurring characters

ValueCountFrequency (%)
B204
33.3%
5204
33.3%
4204
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number408
66.7%
Uppercase Letter204
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5204
50.0%
4204
50.0%
Uppercase Letter
ValueCountFrequency (%)
B204
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common408
66.7%
Latin204
33.3%

Most frequent character per script

Common
ValueCountFrequency (%)
5204
50.0%
4204
50.0%
Latin
ValueCountFrequency (%)
B204
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII612
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
B204
33.3%
5204
33.3%
4204
33.3%

Distinct: 136
Distinct (%): 66.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
Minimum: 2011-01-02 00:00:00
Maximum: 2011-12-29 00:00:00
Histogram with fixed size bins (bins=50)

SEM_NOT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 50
Distinct (%): 24.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201127.8039
Minimum: 201101
Maximum: 201152
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 201101
5-th percentile: 201102
Q1: 201112.75
median: 201127.5
Q3: 201142.25
95-th percentile: 201151
Maximum: 201152
Range: 51
Interquartile range (IQR): 29.5

Descriptive statistics

Standard deviation: 16.16542513
Coefficient of variation (CV): 8.03738957 × 10^-5
Kurtosis: -1.255137843
Mean: 201127.8039
Median Absolute Deviation (MAD): 15
Skewness: -0.1226061946
Sum: 41030072
Variance: 261.3209698
Monotonicity: Increasing
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
201107             11     5.4%
201152             10     4.9%
201149             10     4.9%
201127             9      4.4%
201145             7      3.4%
201102             7      3.4%
201122             7      3.4%
201101             7      3.4%
201133             6      2.9%
201106             6      2.9%
Other values (40)  124    60.8%

Value   Count  Frequency (%)
201101  7      3.4%
201102  7      3.4%
201103  3      1.5%
201104  1      0.5%
201105  2      1.0%
201106  6      2.9%
201107  11     5.4%
201108  2      1.0%
201109  3      1.5%
201110  1      0.5%

Value   Count  Frequency (%)
201152  10     4.9%
201151  5      2.5%
201150  2      1.0%
201149  10     4.9%
201148  3      1.5%
201147  5      2.5%
201146  1      0.5%
201145  7      3.4%
201144  5      2.5%
201143  3      1.5%
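
Several of the SEM_NOT statistics are derived from others in the same section, which makes them easy to cross-check. The sketch below uses the standard definitions (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared); the report does not spell these formulas out, so treat them as assumptions.

```python
# Values copied from the SEM_NOT statistics.
minimum, maximum = 201101, 201152
q1, q3 = 201112.75, 201142.25
mean = 201127.8039
std = 16.16542513

value_range = maximum - minimum  # difference between the extreme codes
iqr = q3 - q1                    # spread of the middle half of the values
cv = std / mean                  # coefficient of variation
variance = std ** 2

print(value_range, iqr)
print(f"CV = {cv:.6e}")
print(f"variance = {variance:.4f}")
```

The coefficient of variation is tiny because the raw values all sit near 201100, so the spread is small relative to the mean.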

NU_ANO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size12.3 KiB
2011
204 

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters: 816
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row2011
2nd row2011
3rd row2011
4th row2011
5th row2011

Common Values

ValueCountFrequency (%)
2011204
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2011204
100.0%

Most occurring characters

ValueCountFrequency (%)
1408
50.0%
2204
25.0%
0204
25.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number816
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1408
50.0%
2204
25.0%
0204
25.0%

Most occurring scripts

ValueCountFrequency (%)
Common816
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1408
50.0%
2204
25.0%
0204
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII816
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1408
50.0%
2204
25.0%
0204
25.0%

SG_UF_NOT
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.5%
Missing0
Missing (%)0.0%
Memory size11.9 KiB
33
200 
31
 
2
35
 
2

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters: 408
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33200
98.0%
312
 
1.0%
352
 
1.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33200
98.0%
352
 
1.0%
312
 
1.0%

Most occurring characters

ValueCountFrequency (%)
3404
99.0%
12
 
0.5%
52
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number408
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3404
99.0%
12
 
0.5%
52
 
0.5%

Most occurring scripts

ValueCountFrequency (%)
Common408
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3404
99.0%
12
 
0.5%
52
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII408
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3404
99.0%
12
 
0.5%
52
 
0.5%

ID_MUNICIP
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct13
Distinct (%)6.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean330477.7549
Minimum310620
Maximum355030
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.7 KiB

Quantile statistics

Minimum310620
5-th percentile330240
Q1330455
median330455
Q3330455
95-th percentile330455
Maximum355030
Range44410
Interquartile range (IQR)0

Descriptive statistics

Standard deviation3135.594401
Coefficient of variation (CV)0.009488064943
Kurtosis51.42519894
Mean330477.7549
Median Absolute Deviation (MAD)0
Skewness2.249393304
Sum67417462
Variance9831952.245
MonotonicityNot monotonic
Histogram with fixed size bins (bins=13)
ValueCountFrequency (%)
330455174
85.3%
33024011
 
5.4%
3303304
 
2.0%
3301003
 
1.5%
3550302
 
1.0%
3303902
 
1.0%
3106202
 
1.0%
3302001
 
0.5%
3304521
 
0.5%
3301701
 
0.5%
Other values (3)3
 
1.5%
ValueCountFrequency (%)
3106202
 
1.0%
3300801
 
0.5%
3301003
 
1.5%
3301701
 
0.5%
3302001
 
0.5%
33024011
5.4%
3303304
 
2.0%
3303902
 
1.0%
3304201
 
0.5%
3304521
 
0.5%
ValueCountFrequency (%)
3550302
 
1.0%
3306301
 
0.5%
330455174
85.3%
3304521
 
0.5%
3304201
 
0.5%
3303902
 
1.0%
3303304
 
2.0%
33024011
 
5.4%
3302001
 
0.5%
3301701
 
0.5%

ID_REGIONA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.5%
Missing0
Missing (%)0.0%
Memory size12.4 KiB
200 
1331
 
2
1449
 
2

Length

Max length4
Median length0
Mean length0.07843137255
Min length0

Characters and Unicode

Total characters: 16
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
200
98.0%
13312
 
1.0%
14492
 
1.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
13312
50.0%
14492
50.0%

Most occurring characters

ValueCountFrequency (%)
16
37.5%
44
25.0%
34
25.0%
92
 
12.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number16
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
16
37.5%
44
25.0%
34
25.0%
92
 
12.5%

Most occurring scripts

ValueCountFrequency (%)
Common16
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
16
37.5%
44
25.0%
34
25.0%
92
 
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII16
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
16
37.5%
44
25.0%
34
25.0%
92
 
12.5%

ID_UNIDADE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct54
Distinct (%)26.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2743472.593
Minimum63
Maximum6734014
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.7 KiB

Quantile statistics

Minimum63
5-th percentile2269554
Q12280183
median2288338
Q33005992
95-th percentile5158044
Maximum6734014
Range6733951
Interquartile range (IQR)725809

Descriptive statistics

Standard deviation: 1095365.625
Coefficient of variation (CV): 0.3992624632
Kurtosis: 3.817876187
Mean: 2743472.593
Median Absolute Deviation (MAD): 18533
Skewness: 1.269084729
Sum: 559668409
Variance: 1.199825853 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
228833856
27.5%
300345022
 
10.8%
226980517
 
8.3%
227653411
 
5.4%
30059928
 
3.9%
51580448
 
3.9%
33754717
 
3.4%
61766664
 
2.0%
30420814
 
2.0%
22801674
 
2.0%
Other values (44)63
30.9%
ValueCountFrequency (%)
633
 
1.5%
125051
 
0.5%
127771
 
0.5%
270492
 
1.0%
20288402
 
1.0%
22693251
 
0.5%
22695542
 
1.0%
226980517
8.3%
22699611
 
0.5%
22705441
 
0.5%
ValueCountFrequency (%)
67340141
 
0.5%
62873361
 
0.5%
62723201
 
0.5%
61766664
2.0%
54352342
 
1.0%
53711201
 
0.5%
51580448
3.9%
38103481
 
0.5%
37849591
 
0.5%
36035391
 
0.5%

Distinct: 149
Distinct (%): 73.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
Minimum: 2010-02-15 00:00:00
Maximum: 2011-12-26 00:00:00
Histogram with fixed size bins (bins=50)

SEM_PRI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct54
Distinct (%)26.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean201123.2696
Minimum201007
Maximum201152
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.7 KiB

Quantile statistics

Minimum201007
5-th percentile201059.35
Q1201109.75
median201127
Q3201140
95-th percentile201150
Maximum201152
Range145
Interquartile range (IQR)30.25

Descriptive statistics

Standard deviation24.34321216
Coefficient of variation (CV)0.0001210362789
Kurtosis4.552289094
Mean201123.2696
Median Absolute Deviation (MAD)15
Skewness-1.806640557
Sum41029147
Variance592.5919782
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
20112710
 
4.9%
2011398
 
3.9%
2011498
 
3.9%
2011058
 
3.9%
2011067
 
3.4%
2010527
 
3.4%
2011477
 
3.4%
2011115
 
2.5%
2011305
 
2.5%
2011315
 
2.5%
Other values (44)134
65.7%
ValueCountFrequency (%)
2010071
 
0.5%
2010351
 
0.5%
2010391
 
0.5%
2010511
 
0.5%
2010527
3.4%
2011013
 
1.5%
2011024
2.0%
2011034
2.0%
2011043
 
1.5%
2011058
3.9%
ValueCountFrequency (%)
2011525
2.5%
2011514
2.0%
2011503
 
1.5%
2011498
3.9%
2011483
 
1.5%
2011477
3.4%
2011462
 
1.0%
2011455
2.5%
2011443
 
1.5%
2011433
 
1.5%
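
SEM_PRI values such as 201007 and 201152 look like a four-digit year followed by a two-digit week number. That encoding is an assumption; the report itself treats the column as a plain real number. Under that assumption, the reported Range of 145 is a difference between codes rather than a count of weeks. A hypothetical helper, `split_year_week`, shows the decoding:

```python
from typing import Tuple


def split_year_week(code: int) -> Tuple[int, int]:
    """Split a YYYYWW integer into (year, week). Assumes a two-digit week suffix."""
    year, week = divmod(code, 100)
    if not 1 <= week <= 53:
        raise ValueError(f"week out of range in code {code}")
    return year, week


minimum, maximum = 201007, 201152  # Minimum and Maximum reported for SEM_PRI
first = split_year_week(minimum)
last = split_year_week(maximum)
code_difference = maximum - minimum

print(first, last, code_difference)
```

The same reading explains the 5-th percentile of 201059.35: it is an interpolated number between the codes 201052 and 201101 and does not correspond to a valid week.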

DT_NASC
Date

MISSING

Distinct: 188
Distinct (%): 94.0%
Missing: 4
Missing (%): 2.0%
Memory size: 1.7 KiB
Minimum: 1924-06-01 00:00:00
Maximum: 2011-11-02 00:00:00
Histogram with fixed size bins (bins=50)

NU_IDADE_N
Real number (ℝ≥0)

HIGH CORRELATION

Distinct61
Distinct (%)29.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4008.098039
Minimum2022
Maximum4086
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.7 KiB

Quantile statistics

Minimum2022
5-th percentile4019
Q14027
median4034.5
Q34047.25
95-th percentile4066.85
Maximum4086
Range2064
Interquartile range (IQR)20.25

Descriptive statistics

Standard deviation200.8522242
Coefficient of variation (CV)0.05011160461
Kurtosis58.06920252
Mean4008.098039
Median Absolute Deviation (MAD)8.5
Skewness-7.279237194
Sum817652
Variance40341.61596
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
404910
 
4.9%
402710
 
4.9%
40269
 
4.4%
40299
 
4.4%
40309
 
4.4%
40317
 
3.4%
40327
 
3.4%
40357
 
3.4%
40406
 
2.9%
40336
 
2.9%
Other values (51)124
60.8%
ValueCountFrequency (%)
20221
0.5%
30011
0.5%
30031
0.5%
30081
0.5%
30101
0.5%
40031
0.5%
40111
0.5%
40151
0.5%
40161
0.5%
40171
0.5%
ValueCountFrequency (%)
40861
0.5%
40851
0.5%
40801
0.5%
40791
0.5%
40771
0.5%
40742
1.0%
40682
1.0%
40672
1.0%
40661
0.5%
40641
0.5%
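
The NU_IDADE_N values cluster in the 4000s, with a few in the 2000s and 3000s, which suggests a composite code rather than an age in years. One plausible reading, stated here as an assumption that should be checked against the data dictionary for this dataset, is that the leading digit gives the unit (1 = hours, 2 = days, 3 = months, 4 = years) and the remaining three digits give the amount. The helper `decode_age` below is hypothetical and only sketches that reading:

```python
from typing import Tuple

# Assumed unit prefixes; verify against the data dictionary before relying on them.
UNITS = {1: "hours", 2: "days", 3: "months", 4: "years"}


def decode_age(code: int) -> Tuple[int, str]:
    """Split a composite age code into (amount, unit) under the assumed scheme."""
    unit_digit, amount = divmod(code, 1000)
    if unit_digit not in UNITS:
        raise ValueError(f"unknown unit prefix in age code {code}")
    return amount, UNITS[unit_digit]


# Codes taken from the value tables above.
for code in (2022, 3001, 4049, 4086):
    print(code, decode_age(code))
```

If this reading is right, statistics such as the mean of 4008.098039 and the skewness of -7.279237194 describe the codes rather than the ages themselves.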

CS_SEXO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct2
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Memory size13.3 KiB
M
142 
F
62 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters: 204
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowM
2nd rowM
3rd rowM
4th rowM
5th rowF

Common Values

ValueCountFrequency (%)
M142
69.6%
F62
30.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
m142
69.6%
f62
30.4%

Most occurring characters

ValueCountFrequency (%)
M142
69.6%
F62
30.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter204
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M142
69.6%
F62
30.4%

Most occurring scripts

ValueCountFrequency (%)
Latin204
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
M142
69.6%
F62
30.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII204
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M142
69.6%
F62
30.4%

CS_GESTANT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)2.0%
Missing0
Missing (%)0.0%
Memory size11.7 KiB
6
149 
5
49 
9
 
5
3
 
1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters: 204
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.5%

Sample

1st row6
2nd row6
3rd row6
4th row6
5th row5

Common Values

ValueCountFrequency (%)
6149
73.0%
549
 
24.0%
95
 
2.5%
31
 
0.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
6149
73.0%
549
 
24.0%
95
 
2.5%
31
 
0.5%

Most occurring characters

ValueCountFrequency (%)
6149
73.0%
549
 
24.0%
95
 
2.5%
31
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number204
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
6149
73.0%
549
 
24.0%
95
 
2.5%
31
 
0.5%

Most occurring scripts

ValueCountFrequency (%)
Common204
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
6149
73.0%
549
 
24.0%
95
 
2.5%
31
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII204
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6149
73.0%
549
 
24.0%
95
 
2.5%
31
 
0.5%

CS_RACA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct5
Distinct (%)2.5%
Missing0
Missing (%)0.0%
Memory size13.2 KiB
1
131 
4
35 
2
19 
9
 
12
 
7

Length

Max length1
Median length1
Mean length0.9656862745
Min length0

Characters and Unicode

Total characters: 197
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row1
2nd row4
3rd row1
4th row2
5th row2

Common Values

ValueCountFrequency (%)
1131
64.2%
435
 
17.2%
219
 
9.3%
912
 
5.9%
7
 
3.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1131
66.5%
435
 
17.8%
219
 
9.6%
912
 
6.1%

Most occurring characters

ValueCountFrequency (%)
1131
66.5%
435
 
17.8%
219
 
9.6%
912
 
6.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number197
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1131
66.5%
435
 
17.8%
219
 
9.6%
912
 
6.1%

Most occurring scripts

ValueCountFrequency (%)
Common197
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1131
66.5%
435
 
17.8%
219
 
9.6%
912
 
6.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII197
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1131
66.5%
435
 
17.8%
219
 
9.6%
912
 
6.1%
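
The CS_RACA percentages differ between the Common Values table (for example 64.2% for value 1) and the later tables (66.5% for the same value). The numbers are consistent with the later tables leaving out the 7 rows whose value is blank, so that shares are taken over 197 rather than 204. That explanation is inferred from the totals, not stated by the report. A short check:

```python
# Counts from the CS_RACA Common Values table; "" is the blank value with 7 rows.
counts = {"1": 131, "4": 35, "2": 19, "9": 12, "": 7}

total_rows = sum(counts.values())
non_blank = {value: n for value, n in counts.items() if value != ""}
total_non_blank = sum(non_blank.values())

# Share of all rows versus share of non-blank rows, rounded as in the report.
shares_all = {v: round(100 * n / total_rows, 1) for v, n in non_blank.items()}
shares_non_blank = {v: round(100 * n / total_non_blank, 1) for v, n in non_blank.items()}

# Every non-blank value is one character long, so mean length is 197 / 204.
mean_length = total_non_blank / total_rows

print(shares_all)
print(shares_non_blank)
print(f"mean length = {mean_length:.10f}")
```

The same blank rows account for the Min length of 0 and the Mean length of 0.9656862745.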

CS_ESCOL_N
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct10
Distinct (%)4.9%
Missing0
Missing (%)0.0%
Memory size14.1 KiB
08
82 
06
33 
09
28 
05
14 
14 
Other values (5)
33 

Length

Max length2
Median length2
Mean length1.862745098
Min length0

Characters and Unicode

Total characters: 380
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row08
2nd row
3rd row09
4th row06
5th row06

Common Values

ValueCountFrequency (%)
0882
40.2%
0633
16.2%
0928
 
13.7%
0514
 
6.9%
14
 
6.9%
0410
 
4.9%
078
 
3.9%
106
 
2.9%
036
 
2.9%
023
 
1.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
0882
43.2%
0633
17.4%
0928
 
14.7%
0514
 
7.4%
0410
 
5.3%
078
 
4.2%
106
 
3.2%
036
 
3.2%
023
 
1.6%

Most occurring characters

ValueCountFrequency (%)
0190
50.0%
882
21.6%
633
 
8.7%
928
 
7.4%
514
 
3.7%
410
 
2.6%
78
 
2.1%
16
 
1.6%
36
 
1.6%
23
 
0.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number380
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0190
50.0%
882
21.6%
633
 
8.7%
928
 
7.4%
514
 
3.7%
410
 
2.6%
78
 
2.1%
16
 
1.6%
36
 
1.6%
23
 
0.8%

Most occurring scripts

ValueCountFrequency (%)
Common380
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0190
50.0%
882
21.6%
633
 
8.7%
928
 
7.4%
514
 
3.7%
410
 
2.6%
78
 
2.1%
16
 
1.6%
36
 
1.6%
23
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII380
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0190
50.0%
882
21.6%
633
 
8.7%
928
 
7.4%
514
 
3.7%
410
 
2.6%
78
 
2.1%
16
 
1.6%
36
 
1.6%
23
 
0.8%

SG_UF
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size11.9 KiB
33
204 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters: 408
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33204
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33204
100.0%

Most occurring characters

ValueCountFrequency (%)
3408
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number408
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3408
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common408
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3408
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII408
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3408
100.0%

ID_MN_RESI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct20
Distinct (%)9.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean330400.402
Minimum330020
Maximum330630
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.7 KiB

Quantile statistics

Minimum330020
5-th percentile330170
Q1330350
median330455
Q3330455
95-th percentile330455
Maximum330630
Range610
Interquartile range (IQR)105

Descriptive statistics

Standard deviation106.5357092
Coefficient of variation (CV)0.0003224442482
Kurtosis1.70512483
Mean330400.402
Median Absolute Deviation (MAD)0
Skewness-1.621978939
Sum67401682
Variance11349.85734
MonotonicityNot monotonic
Histogram with fixed size bins (bins=20)
ValueCountFrequency (%)
330455142
69.6%
33024011
 
5.4%
33017011
 
5.4%
3303309
 
4.4%
3303506
 
2.9%
3303204
 
2.0%
3304903
 
1.5%
3301003
 
1.5%
3303902
 
1.0%
3300802
 
1.0%
Other values (10)11
 
5.4%
ValueCountFrequency (%)
3300201
 
0.5%
3300802
 
1.0%
3301003
 
1.5%
3301401
 
0.5%
3301501
 
0.5%
33017011
5.4%
3302001
 
0.5%
33024011
5.4%
3302601
 
0.5%
3302701
 
0.5%
ValueCountFrequency (%)
3306301
 
0.5%
3305401
 
0.5%
3304903
 
1.5%
330455142
69.6%
3304521
 
0.5%
3304202
 
1.0%
3303902
 
1.0%
3303506
 
2.9%
3303309
 
4.4%
3303204
 
2.0%

ID_RG_RESI
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size12.3 KiB
204 

Length

Max length0
Median length0
Mean length0
Min length0

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
204
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
No values found.

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

ID_PAIS
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size11.7 KiB
1
204 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters: 204
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
1204
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1204
100.0%

Most occurring characters

ValueCountFrequency (%)
1204
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number204
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1204
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common204
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1204
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII204
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1204
100.0%

DT_INVEST
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 204
Missing (%): 100.0%
Memory size: 1.7 KiB

ID_OCUPA_N
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct38
Distinct (%)18.6%
Missing0
Missing (%)0.0%
Memory size12.4 KiB
156 
999991
 
8
252105
 
3
222105
 
2
241040
 
2
Other values (33)
33 

Length

Max length6
Median length0
Mean length1.411764706
Min length0

Characters and Unicode

Total characters: 288
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 33
Unique (%): 16.2%

Sample

1st row215305
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
156
76.5%
9999918
 
3.9%
2521053
 
1.5%
2221052
 
1.0%
2410402
 
1.0%
2525251
 
0.5%
3421201
 
0.5%
7252051
 
0.5%
7827051
 
0.5%
9999931
 
0.5%
Other values (28)28
 
13.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
9999918
 
16.7%
2521053
 
6.2%
2221052
 
4.2%
2410402
 
4.2%
2134151
 
2.1%
2525251
 
2.1%
3421201
 
2.1%
7252051
 
2.1%
7827051
 
2.1%
9999931
 
2.1%
Other values (27)27
56.2%

Most occurring characters

ValueCountFrequency (%)
153
18.4%
952
18.1%
550
17.4%
245
15.6%
039
13.5%
420
 
6.9%
312
 
4.2%
712
 
4.2%
83
 
1.0%
62
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number288
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
153
18.4%
952
18.1%
550
17.4%
245
15.6%
039
13.5%
420
 
6.9%
312
 
4.2%
712
 
4.2%
83
 
1.0%
62
 
0.7%

Most occurring scripts

ValueCountFrequency (%)
Common288
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
153
18.4%
952
18.1%
550
17.4%
245
15.6%
039
13.5%
420
 
6.9%
312
 
4.2%
712
 
4.2%
83
 
1.0%
62
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII288
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
153
18.4%
952
18.1%
550
17.4%
245
15.6%
039
13.5%
420
 
6.9%
312
 
4.2%
712
 
4.2%
83
 
1.0%
62
 
0.7%

CLASSI_FIN
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)2.0%
Missing0
Missing (%)0.0%
Memory size13.3 KiB
2
98 
1
96 
8
 
9
 
1

Length

Max length1
Median length1
Mean length0.9950980392
Min length0

Characters and Unicode

Total characters203
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.5%

Sample

1st row1
2nd row8
3rd row1
4th row2
5th row2

Common Values

Value            Count  Frequency (%)
2                   98  48.0%
1                   96  47.1%
8                    9  4.4%
(empty string)       1  0.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
298
48.3%
196
47.3%
89
 
4.4%

Most occurring characters

ValueCountFrequency (%)
298
48.3%
196
47.3%
89
 
4.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number203
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
298
48.3%
196
47.3%
89
 
4.4%

Most occurring scripts

ValueCountFrequency (%)
Common203
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
298
48.3%
196
47.3%
89
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII203
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
298
48.3%
196
47.3%
89
 
4.4%

AT_ATIVIDA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct10
Distinct (%)4.9%
Missing0
Missing (%)0.0%
Memory size14.1 KiB
11
134 
10
21 
99
19 
9
 
7
8
 
6
Other values (5)
17 

Length

Max length2
Median length2
Mean length1.833333333
Min length0

Characters and Unicode

Total characters374
Distinct characters7
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row11
2nd row99
3rd row11
4th row11
5th row11

Common Values

Value            Count  Frequency (%)
11                 134  65.7%
10                  21  10.3%
99                  19  9.3%
9                    7  3.4%
8                    6  2.9%
4                    5  2.5%
(empty string)       4  2.0%
1                    3  1.5%
6                    3  1.5%
3                    2  1.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
11134
67.0%
1021
 
10.5%
9919
 
9.5%
97
 
3.5%
86
 
3.0%
45
 
2.5%
13
 
1.5%
63
 
1.5%
32
 
1.0%

Most occurring characters

ValueCountFrequency (%)
1292
78.1%
945
 
12.0%
021
 
5.6%
86
 
1.6%
45
 
1.3%
63
 
0.8%
32
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number374
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1292
78.1%
945
 
12.0%
021
 
5.6%
86
 
1.6%
45
 
1.3%
63
 
0.8%
32
 
0.5%

Most occurring scripts

ValueCountFrequency (%)
Common374
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1292
78.1%
945
 
12.0%
021
 
5.6%
86
 
1.6%
45
 
1.3%
63
 
0.8%
32
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII374
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1292
78.1%
945
 
12.0%
021
 
5.6%
86
 
1.6%
45
 
1.3%
63
 
0.8%
32
 
0.5%

AT_LAMINA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)2.0%
Missing0
Missing (%)0.0%
Memory size13.3 KiB
2
165 
1
34 
 
4
3
 
1

Length

Max length1
Median length1
Mean length0.9803921569
Min length0

Characters and Unicode

Total characters200
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.5%

Sample

1st row2
2nd row2
3rd row2
4th row2
5th row2

Common Values

Value            Count  Frequency (%)
2                  165  80.9%
1                   34  16.7%
(empty string)       4  2.0%
3                    1  0.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2165
82.5%
134
 
17.0%
31
 
0.5%

Most occurring characters

ValueCountFrequency (%)
2165
82.5%
134
 
17.0%
31
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number200
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2165
82.5%
134
 
17.0%
31
 
0.5%

Most occurring scripts

ValueCountFrequency (%)
Common200
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2165
82.5%
134
 
17.0%
31
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2165
82.5%
134
 
17.0%
31
 
0.5%

AT_SINTOMA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.5%
Missing0
Missing (%)0.0%
Memory size13.3 KiB
1
196 
2
 
4
 
4

Length

Max length1
Median length1
Mean length0.9803921569
Min length0

Characters and Unicode

Total characters200
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

Value            Count  Frequency (%)
1                  196  96.1%
2                    4  2.0%
(empty string)       4  2.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1196
98.0%
24
 
2.0%

Most occurring characters

ValueCountFrequency (%)
1196
98.0%
24
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number200
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1196
98.0%
24
 
2.0%

Most occurring scripts

ValueCountFrequency (%)
Common200
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1196
98.0%
24
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1196
98.0%
24
 
2.0%

TPAUTOCTO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.5%
Missing0
Missing (%)0.0%
Memory size12.8 KiB
107 
2
82 
3
15 

Length

Max length1
Median length0
Mean length0.4754901961
Min length0

Characters and Unicode

Total characters97
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row
3rd row2
4th row
5th row

Common Values

Value            Count  Frequency (%)
(empty string)     107  52.5%
2                   82  40.2%
3                   15  7.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
282
84.5%
315
 
15.5%

Most occurring characters

ValueCountFrequency (%)
282
84.5%
315
 
15.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number97
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
282
84.5%
315
 
15.5%

Most occurring scripts

ValueCountFrequency (%)
Common97
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
282
84.5%
315
 
15.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII97
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
282
84.5%
315
 
15.5%

COUFINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct8
Distinct (%)3.9%
Missing0
Missing (%)0.0%
Memory size12.6 KiB
167 
AM
 
14
RO
 
10
PA
 
5
MA
 
3
Other values (3)
 
5

Length

Max length2
Median length0
Mean length0.362745098
Min length0

Characters and Unicode

Total characters74
Distinct characters7
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.5%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value            Count  Frequency (%)
(empty string)     167  81.9%
AM                  14  6.9%
RO                  10  4.9%
PA                   5  2.5%
MA                   3  1.5%
AC                   2  1.0%
AP                   2  1.0%
MS                   1  0.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
am14
37.8%
ro10
27.0%
pa5
 
13.5%
ma3
 
8.1%
ac2
 
5.4%
ap2
 
5.4%
ms1
 
2.7%

Most occurring characters

ValueCountFrequency (%)
A26
35.1%
M18
24.3%
R10
 
13.5%
O10
 
13.5%
P7
 
9.5%
C2
 
2.7%
S1
 
1.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter74
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A26
35.1%
M18
24.3%
R10
 
13.5%
O10
 
13.5%
P7
 
9.5%
C2
 
2.7%
S1
 
1.4%

Most occurring scripts

ValueCountFrequency (%)
Latin74
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A26
35.1%
M18
24.3%
R10
 
13.5%
O10
 
13.5%
P7
 
9.5%
C2
 
2.7%
S1
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII74
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A26
35.1%
M18
24.3%
R10
 
13.5%
O10
 
13.5%
P7
 
9.5%
C2
 
2.7%
S1
 
1.4%

COPAISINF
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct14
Distinct (%)6.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean13.21568627
Minimum0
Maximum177
Zeros123
Zeros (%)60.3%
Negative0
Negative (%)0.0%
Memory size1.7 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q31
95-th percentile110.7
Maximum177
Range177
Interquartile range (IQR)1

Descriptive statistics

Standard deviation33.60246515
Coefficient of variation (CV)2.542619766
Kurtosis10.22621218
Mean13.21568627
Median Absolute Deviation (MAD)0
Skewness3.243308906
Sum2696
Variance1129.125664
MonotonicityNot monotonic
Histogram with fixed size bins (bins=14)
Common values

Value              Count  Frequency (%)
0                    123  60.3%
1                     37  18.1%
31                    28  13.7%
113                    3  1.5%
22                     3  1.5%
140                    2  1.0%
177                    1  0.5%
164                    1  0.5%
157                    1  0.5%
153                    1  0.5%
Other values (4)       4  2.0%

Minimum 10 values

Value              Count  Frequency (%)
0                    123  60.3%
1                     37  18.1%
22                     3  1.5%
31                    28  13.7%
97                     1  0.5%
109                    1  0.5%
111                    1  0.5%
113                    3  1.5%
138                    1  0.5%
140                    2  1.0%

Maximum 10 values

Value              Count  Frequency (%)
177                    1  0.5%
164                    1  0.5%
157                    1  0.5%
153                    1  0.5%
140                    2  1.0%
138                    1  0.5%
113                    3  1.5%
111                    1  0.5%
109                    1  0.5%
97                     1  0.5%

COMUNINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct22
Distinct (%)10.8%
Missing0
Missing (%)0.0%
Memory size12.3 KiB
167 
130260
 
9
110020
 
7
150140
 
2
120040
 
2
Other values (17)
17 

Length

Max length6
Median length0
Mean length1.088235294
Min length0

Characters and Unicode

Total characters222
Distinct characters9
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique17 ?
Unique (%)8.3%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value              Count  Frequency (%)
(empty string)       167  81.9%
130260                 9  4.4%
110020                 7  3.4%
150140                 2  1.0%
120040                 2  1.0%
211130                 1  0.5%
211250                 1  0.5%
110012                 1  0.5%
130356                 1  0.5%
130008                 1  0.5%
Other values (12)     12  5.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1302609
24.3%
1100207
18.9%
1200402
 
5.4%
1501402
 
5.4%
5005201
 
2.7%
2112501
 
2.7%
1100121
 
2.7%
1303561
 
2.7%
1300081
 
2.7%
1304201
 
2.7%
Other values (11)11
29.7%

Most occurring characters

ValueCountFrequency (%)
087
39.2%
155
24.8%
228
 
12.6%
319
 
8.6%
613
 
5.9%
59
 
4.1%
46
 
2.7%
83
 
1.4%
72
 
0.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number222
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
087
39.2%
155
24.8%
228
 
12.6%
319
 
8.6%
613
 
5.9%
59
 
4.1%
46
 
2.7%
83
 
1.4%
72
 
0.9%

Most occurring scripts

ValueCountFrequency (%)
Common222
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
087
39.2%
155
24.8%
228
 
12.6%
319
 
8.6%
613
 
5.9%
59
 
4.1%
46
 
2.7%
83
 
1.4%
72
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII222
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
087
39.2%
155
24.8%
228
 
12.6%
319
 
8.6%
613
 
5.9%
59
 
4.1%
46
 
2.7%
83
 
1.4%
72
 
0.9%

LOC_INF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct27
Distinct (%)13.2%
Missing0
Missing (%)0.0%
Memory size12.3 KiB
162 
LUAN
 
11
MANA
 
5
MOCA
 
2
PEDR
 
2
Other values (22)
22 

Length

Max length4
Median length0
Mean length0.8137254902
Min length0

Characters and Unicode

Total characters166
Distinct characters19
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique22 ?
Unique (%)10.8%

Sample

1st rowLUAN
2nd row
3rd rowLAS
4th row
5th row

Common Values

Value              Count  Frequency (%)
(empty string)       162  79.4%
LUAN                  11  5.4%
MANA                   5  2.5%
MOCA                   2  1.0%
PEDR                   2  1.0%
SINT                   1  0.5%
GANA                   1  0.5%
LAS                    1  0.5%
PARA                   1  0.5%
CATA                   1  0.5%
Other values (17)     17  8.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
luan11
26.2%
mana5
 
11.9%
pedr2
 
4.8%
moca2
 
4.8%
nige1
 
2.4%
alto1
 
2.4%
las1
 
2.4%
brev1
 
2.4%
ariq1
 
2.4%
cata1
 
2.4%
Other values (16)16
38.1%

Most occurring characters

ValueCountFrequency (%)
A38
22.9%
N23
13.9%
L16
9.6%
U14
 
8.4%
E11
 
6.6%
R9
 
5.4%
M9
 
5.4%
T8
 
4.8%
I7
 
4.2%
O6
 
3.6%
Other values (9)25
15.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter166
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A38
22.9%
N23
13.9%
L16
9.6%
U14
 
8.4%
E11
 
6.6%
R9
 
5.4%
M9
 
5.4%
T8
 
4.8%
I7
 
4.2%
O6
 
3.6%
Other values (9)25
15.1%

Most occurring scripts

ValueCountFrequency (%)
Latin166
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A38
22.9%
N23
13.9%
L16
9.6%
U14
 
8.4%
E11
 
6.6%
R9
 
5.4%
M9
 
5.4%
T8
 
4.8%
I7
 
4.2%
O6
 
3.6%
Other values (9)25
15.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII166
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A38
22.9%
N23
13.9%
L16
9.6%
U14
 
8.4%
E11
 
6.6%
R9
 
5.4%
M9
 
5.4%
T8
 
4.8%
I7
 
4.2%
O6
 
3.6%
Other values (9)25
15.1%

DEXAME
Categorical

HIGH CARDINALITY
UNIFORM

Distinct130
Distinct (%)63.7%
Missing0
Missing (%)0.0%
Memory size13.4 KiB
2011-07-06
 
7
None
 
5
2011-08-15
 
4
2011-10-03
 
3
2011-11-04
 
3
Other values (125)
182 

Length

Max length10
Median length10
Mean length9.852941176
Min length4

Characters and Unicode

Total characters2010
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique80 ?
Unique (%)39.2%

Sample

1st row2011-01-02
2nd row2011-01-03
3rd row2011-01-04
4th row2011-01-06
5th row2011-01-06

Common Values

Value               Count  Frequency (%)
2011-07-06              7  3.4%
None                    5  2.5%
2011-08-15              4  2.0%
2011-10-03              3  1.5%
2011-11-04              3  1.5%
2011-05-25              3  1.5%
2011-01-13              3  1.5%
2011-12-26              3  1.5%
2011-12-05              3  1.5%
2011-02-15              3  1.5%
Other values (120)    167  81.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2011-07-067
 
3.4%
none5
 
2.5%
2011-08-154
 
2.0%
2011-12-273
 
1.5%
2011-05-253
 
1.5%
2011-11-043
 
1.5%
2011-01-133
 
1.5%
2011-12-263
 
1.5%
2011-02-073
 
1.5%
2011-12-053
 
1.5%
Other values (120)167
81.9%

Most occurring characters

ValueCountFrequency (%)
1577
28.7%
0435
21.6%
-398
19.8%
2333
16.6%
644
 
2.2%
542
 
2.1%
335
 
1.7%
735
 
1.7%
431
 
1.5%
831
 
1.5%
Other values (5)49
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1592
79.2%
Dash Punctuation398
 
19.8%
Lowercase Letter15
 
0.7%
Uppercase Letter5
 
0.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1577
36.2%
0435
27.3%
2333
20.9%
644
 
2.8%
542
 
2.6%
335
 
2.2%
735
 
2.2%
431
 
1.9%
831
 
1.9%
929
 
1.8%
Lowercase Letter
ValueCountFrequency (%)
o5
33.3%
n5
33.3%
e5
33.3%
Dash Punctuation
ValueCountFrequency (%)
-398
100.0%
Uppercase Letter
ValueCountFrequency (%)
N5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1990
99.0%
Latin20
 
1.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1577
29.0%
0435
21.9%
-398
20.0%
2333
16.7%
644
 
2.2%
542
 
2.1%
335
 
1.8%
735
 
1.8%
431
 
1.6%
831
 
1.6%
Latin
ValueCountFrequency (%)
N5
25.0%
o5
25.0%
n5
25.0%
e5
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2010
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1577
28.7%
0435
21.6%
-398
19.8%
2333
16.6%
644
 
2.2%
542
 
2.1%
335
 
1.7%
735
 
1.7%
431
 
1.5%
831
 
1.5%
Other values (5)49
 
2.4%

RESULT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct9
Distinct (%)4.4%
Missing0
Missing (%)0.0%
Memory size13.3 KiB
1
99 
2
46 
4
45 
 
4
10
 
3
Other values (4)
 
7

Length

Max length2
Median length1
Mean length0.9950980392
Min length0

Characters and Unicode

Total characters203
Distinct characters8
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)1.0%

Sample

1st row2
2nd row4
3rd row2
4th row1
5th row1

Common Values

Value            Count  Frequency (%)
1                   99  48.5%
2                   46  22.5%
4                   45  22.1%
(empty string)       4  2.0%
10                   3  1.5%
5                    3  1.5%
7                    2  1.0%
6                    1  0.5%
3                    1  0.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
199
49.5%
246
23.0%
445
22.5%
103
 
1.5%
53
 
1.5%
72
 
1.0%
61
 
0.5%
31
 
0.5%

Most occurring characters

ValueCountFrequency (%)
1102
50.2%
246
22.7%
445
22.2%
03
 
1.5%
53
 
1.5%
72
 
1.0%
31
 
0.5%
61
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number203
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1102
50.2%
246
22.7%
445
22.2%
03
 
1.5%
53
 
1.5%
72
 
1.0%
31
 
0.5%
61
 
0.5%

Most occurring scripts

ValueCountFrequency (%)
Common203
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1102
50.2%
246
22.7%
445
22.2%
03
 
1.5%
53
 
1.5%
72
 
1.0%
31
 
0.5%
61
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII203
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1102
50.2%
246
22.7%
445
22.2%
03
 
1.5%
53
 
1.5%
72
 
1.0%
31
 
0.5%
61
 
0.5%

PMM
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct61
Distinct (%)71.8%
Missing119
Missing (%)58.3%
Infinite0
Infinite (%)0.0%
Mean5131.141176
Minimum3
Maximum200000
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.7 KiB

Quantile statistics

Minimum3
5-th percentile258
Q1357
median480
Q32800
95-th percentile12288
Maximum200000
Range199997
Interquartile range (IQR)2443

Descriptive statistics

Standard deviation22043.98483
Coefficient of variation (CV)4.296117389
Kurtosis75.02802548
Mean5131.141176
Median Absolute Deviation (MAD)174
Skewness8.455287793
Sum436147
Variance485937267.4
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
380                    8  3.9%
420                    3  1.5%
780                    3  1.5%
480                    3  1.5%
340                    3  1.5%
490                    3  1.5%
8890                   2  1.0%
315                    2  1.0%
350                    2  1.0%
402                    2  1.0%
Other values (51)     54  26.5%
(Missing)            119  58.3%

Minimum 10 values

Value              Count  Frequency (%)
3                      1  0.5%
200                    1  0.5%
208                    1  0.5%
210                    1  0.5%
253                    1  0.5%
278                    1  0.5%
280                    1  0.5%
300                    1  0.5%
301                    1  0.5%
306                    1  0.5%

Maximum 10 values

Value              Count  Frequency (%)
200000                 1  0.5%
36160                  1  0.5%
20120                  1  0.5%
14800                  1  0.5%
12400                  1  0.5%
11840                  1  0.5%
10800                  2  1.0%
9850                   1  0.5%
8890                   2  1.0%
6880                   1  0.5%

PCRUZ
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)3.4%
Missing0
Missing (%)0.0%
Memory size12.8 KiB
103 
3
44 
4
35 
2
11 
5
 
7
Other values (2)
 
4

Length

Max length1
Median length0
Mean length0.4950980392
Min length0

Characters and Unicode

Total characters101
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.5%

Sample

1st row5
2nd row4
3rd row3
4th row
5th row

Common Values

Value            Count  Frequency (%)
(empty string)     103  50.5%
3                   44  21.6%
4                   35  17.2%
2                   11  5.4%
5                    7  3.4%
1                    3  1.5%
6                    1  0.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
344
43.6%
435
34.7%
211
 
10.9%
57
 
6.9%
13
 
3.0%
61
 
1.0%

Most occurring characters

ValueCountFrequency (%)
344
43.6%
435
34.7%
211
 
10.9%
57
 
6.9%
13
 
3.0%
61
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number101
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
344
43.6%
435
34.7%
211
 
10.9%
57
 
6.9%
13
 
3.0%
61
 
1.0%

Most occurring scripts

ValueCountFrequency (%)
Common101
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
344
43.6%
435
34.7%
211
 
10.9%
57
 
6.9%
13
 
3.0%
61
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII101
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
344
43.6%
435
34.7%
211
 
10.9%
57
 
6.9%
13
 
3.0%
61
 
1.0%

TRA_ESQUEM
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct6
Distinct (%)2.9%
Missing0
Missing (%)0.0%
Memory size13.0 KiB
105 
99
53 
1
42 
3
 
2
11
 
1

Length

Max length2
Median length0
Mean length0.7549019608
Min length0

Characters and Unicode

Total characters154
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)1.0%

Sample

1st row11
2nd row
3rd row99
4th row
5th row

Common Values

Value            Count  Frequency (%)
(empty string)     105  51.5%
99                  53  26.0%
1                   42  20.6%
3                    2  1.0%
11                   1  0.5%
10                   1  0.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
9953
53.5%
142
42.4%
32
 
2.0%
111
 
1.0%
101
 
1.0%

Most occurring characters

ValueCountFrequency (%)
9106
68.8%
145
29.2%
32
 
1.3%
01
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number154
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
9106
68.8%
145
29.2%
32
 
1.3%
01
 
0.6%

Most occurring scripts

ValueCountFrequency (%)
Common154
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
9106
68.8%
145
29.2%
32
 
1.3%
01
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII154
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9106
68.8%
145
29.2%
32
 
1.3%
01
 
0.6%

DSTRAESQUE
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct26
Distinct (%)12.7%
Missing0
Missing (%)0.0%
Memory size13.2 KiB
152 
ARTESUNATO + MEFLOQUINA
16 
ARTESUNATO+MEFLOQUINA
 
6
ARTESUNATO INJETAVEL
 
4
ARTESUNATO +MEFLOQUINA
 
4
Other values (21)
22 

Length

Max length30
Median length0
Mean length5.705882353
Min length0

Characters and Unicode

Total characters1164
Distinct characters28
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique20 ?
Unique (%)9.8%

Sample

1st row
2nd row
3rd rowARTESUNATO + MEFLOQUINA
4th row
5th row

Common Values

Value                           Count  Frequency (%)
(empty string)                    152  74.5%
ARTESUNATO + MEFLOQUINA            16  7.8%
ARTESUNATO+MEFLOQUINA               6  2.9%
ARTESUNATO INJETAVEL                4  2.0%
ARTESUNATO +MEFLOQUINA              4  2.0%
ARTESUNATO+ MEFLOQUINA              2  1.0%
ARTEZUNATO + NEFLOQUINA             1  0.5%
10 COMP CLOROQUINA 28 PRIMOQUI      1  0.5%
10 COMPRIMIDOS CLOROQUINA           1  0.5%
ARTESUNATO+CILINDAMICINA            1  0.5%
Other values (16)                  16  7.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
artesunato29
22.5%
mefloquina27
20.9%
26
20.2%
artesunato+mefloquina6
 
4.7%
injetavel4
 
3.1%
artesanato3
 
2.3%
primaquina2
 
1.6%
102
 
1.6%
cloroquina2
 
1.6%
282
 
1.6%
Other values (26)26
20.2%

Most occurring characters

ValueCountFrequency (%)
A148
12.7%
N100
 
8.6%
E97
 
8.3%
T96
 
8.2%
O96
 
8.2%
U90
 
7.7%
77
 
6.6%
I68
 
5.8%
R59
 
5.1%
M51
 
4.4%
Other values (18)282
24.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1019
87.5%
Space Separator77
 
6.6%
Math Symbol43
 
3.7%
Decimal Number25
 
2.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A148
14.5%
N100
9.8%
E97
9.5%
T96
9.4%
O96
9.4%
U90
8.8%
I68
6.7%
R59
 
5.8%
M51
 
5.0%
Q48
 
4.7%
Other values (10)166
16.3%
Decimal Number
ValueCountFrequency (%)
010
40.0%
16
24.0%
25
20.0%
82
 
8.0%
31
 
4.0%
51
 
4.0%
Space Separator
ValueCountFrequency (%)
77
100.0%
Math Symbol
ValueCountFrequency (%)
+43
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1019
87.5%
Common145
 
12.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
A148
14.5%
N100
9.8%
E97
9.5%
T96
9.4%
O96
9.4%
U90
8.8%
I68
6.7%
R59
 
5.8%
M51
 
5.0%
Q48
 
4.7%
Other values (10)166
16.3%
Common
ValueCountFrequency (%)
77
53.1%
+43
29.7%
010
 
6.9%
16
 
4.1%
25
 
3.4%
82
 
1.4%
31
 
0.7%
51
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII1164
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A148
12.7%
N100
 
8.6%
E97
 
8.3%
T96
 
8.2%
O96
 
8.2%
U90
 
7.7%
77
 
6.6%
I68
 
5.8%
R59
 
5.1%
M51
 
4.4%
Other values (18)282
24.2%

DTRATA
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION

Distinct82
Distinct (%)40.2%
Missing0
Missing (%)0.0%
Memory size12.9 KiB
None
104 
2011-08-15
 
3
2011-11-29
 
2
2011-10-05
 
2
2011-11-01
 
2
Other values (77)
91 

Length

Max length10
Median length4
Mean length6.941176471
Min length4

Characters and Unicode

Total characters1416
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique63 ?
Unique (%)30.9%

Sample

1st row2011-01-02
2nd row2011-01-04
3rd row2011-01-04
4th rowNone
5th rowNone

Common Values

Value              Count  Frequency (%)
None                 104  51.0%
2011-08-15             3  1.5%
2011-11-29             2  1.0%
2011-10-05             2  1.0%
2011-11-01             2  1.0%
2011-01-13             2  1.0%
2011-09-14             2  1.0%
2011-04-04             2  1.0%
2011-06-03             2  1.0%
2011-02-14             2  1.0%
Other values (72)     81  39.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none104
51.0%
2011-08-153
 
1.5%
2011-11-292
 
1.0%
2011-10-052
 
1.0%
2011-12-292
 
1.0%
2011-11-012
 
1.0%
2011-01-132
 
1.0%
2011-04-042
 
1.0%
2011-06-032
 
1.0%
2011-11-112
 
1.0%
Other values (72)81
39.7%

Most occurring characters

ValueCountFrequency (%)
1300
21.2%
0210
14.8%
-200
14.1%
2165
11.7%
N104
 
7.3%
o104
 
7.3%
n104
 
7.3%
e104
 
7.3%
526
 
1.8%
423
 
1.6%
Other values (5)76
 
5.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number800
56.5%
Lowercase Letter312
 
22.0%
Dash Punctuation200
 
14.1%
Uppercase Letter104
 
7.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1300
37.5%
0210
26.2%
2165
20.6%
526
 
3.2%
423
 
2.9%
821
 
2.6%
616
 
2.0%
316
 
2.0%
914
 
1.8%
79
 
1.1%
Lowercase Letter
ValueCountFrequency (%)
o104
33.3%
n104
33.3%
e104
33.3%
Dash Punctuation
ValueCountFrequency (%)
-200
100.0%
Uppercase Letter
ValueCountFrequency (%)
N104
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1000
70.6%
Latin416
29.4%

Most frequent character per script

Common
ValueCountFrequency (%)
1300
30.0%
0210
21.0%
-200
20.0%
2165
16.5%
526
 
2.6%
423
 
2.3%
821
 
2.1%
616
 
1.6%
316
 
1.6%
914
 
1.4%
Latin
ValueCountFrequency (%)
N104
25.0%
o104
25.0%
n104
25.0%
e104
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1416
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1300
21.2%
0210
14.8%
-200
14.1%
2165
11.7%
N104
 
7.3%
o104
 
7.3%
n104
 
7.3%
e104
 
7.3%
526
 
1.8%
423
 
1.6%
Other values (5)76
 
5.4%

DT_ENCERRA
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing204
Missing (%)100.0%
Memory size1.7 KiB

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only its direction does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
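
As a minimal sketch of that calculation (the function name `pearson_r` and the sample data are illustrative assumptions, not taken from the report), r can be computed directly from the definition. The 1/n factors of the covariance and of the standard deviations cancel, so only sums of deviations are needed. The sketch assumes both inputs have the same length and that neither is constant.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of (co)deviations; the common 1/n factor cancels in the ratio
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: y is an exact increasing linear function of x
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
r_up = pearson_r(x, y)

# Shifting and rescaling x leaves r unchanged; reversing the direction flips its sign
r_shifted = pearson_r([10 * a + 5 for a in x], y)
r_down = pearson_r(x, [-b for b in y])
```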

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
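
The following sketch illustrates that recipe under stated assumptions: the helper names `average_ranks` and `spearman_rho` are hypothetical, ties receive the average of the ranks they span, and neither input is constant. It ranks each variable and then applies the covariance-over-standard-deviations formula to the ranks.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: y = x**3 is nonlinear but strictly increasing
x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]
rho = spearman_rho(x, y)
tied = average_ranks([10, 20, 20, 30])
```

Because only the ordering matters, the cubic relationship above yields a perfect rank correlation even though it is not linear.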

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
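
A direct, quadratic-time sketch of that pair-counting formula is shown below. The function name `kendall_tau_a` and the data are illustrative assumptions. The formula as described corresponds to the variant usually called tau-a; tie-adjusted variants such as tau-b use a different denominator and may give different values when ties are present, and this report does not state which variant was used.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # The sign of the product shows whether both variables move the same way
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Hypothetical data: one swapped pair out of six
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)  # 5 concordant, 1 discordant, 6 pairs
```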

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
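
The bias-corrected estimator can be sketched in plain Python as follows. This is an illustration of Bergsma's correction, not the report generator's own code; the function name `cramers_v_corrected` and the sample data are assumptions. The chi-squared statistic is computed from the contingency table, φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and floored at zero, and the row and column counts r and k are shrunk before taking the ratio.

```python
import math
from collections import Counter


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equal-length sequences of category labels."""
    n = len(a)
    if n < 2:
        raise ValueError("need at least two observations")
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over every cell of the contingency table
    chi2 = 0.0
    for row, row_count in row_totals.items():
        for col, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        # A constant variable carries no association information
        return 0.0
    return math.sqrt(phi2_corr / denom)


# Hypothetical data: perfectly associated labels, then independent labels
v_perfect = cramers_v_corrected(["x"] * 10 + ["y"] * 10, ["p"] * 10 + ["q"] * 10)
v_independent = cramers_v_corrected(["x", "x", "y", "y"] * 5, ["p", "q", "p", "q"] * 5)
```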

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
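
Nullity correlation can be illustrated by correlating presence indicators (1 if a value is present, 0 if missing) for two columns. The sketch below is an assumption-laden example: the helper name `nullity_correlation` is hypothetical, the rows are made-up values that only borrow the column names PMM and DTRATA, and the report's own heatmap may be computed differently. It also assumes each column has at least one present and one missing value.

```python
import math


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0 if row[col_a] is None else 1 for row in rows]
    b = [0 if row[col_b] is None else 1 for row in rows]
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical rows; None marks a missing cell
rows = [
    {"PMM": 2400.0, "DTRATA": "2011-01-02"},
    {"PMM": None, "DTRATA": "2011-01-04"},
    {"PMM": 420.0, "DTRATA": "2011-01-04"},
    {"PMM": None, "DTRATA": None},
    {"PMM": None, "DTRATA": None},
]
nullity_r = nullity_correlation(rows, "PMM", "DTRATA")
```

A value near +1 means the two columns tend to be present or missing together, while a value near -1 means one tends to be present when the other is missing.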

Sample

First rows

TP_NOT ID_AGRAVO DT_NOTIFIC SEM_NOT NU_ANO SG_UF_NOT ID_MUNICIP ID_REGIONA ID_UNIDADE DT_SIN_PRI SEM_PRI DT_NASC NU_IDADE_N CS_SEXO CS_GESTANT CS_RACA CS_ESCOL_N SG_UF ID_MN_RESI ID_RG_RESI ID_PAIS DT_INVEST ID_OCUPA_N CLASSI_FIN AT_ATIVIDA AT_LAMINA AT_SINTOMA TPAUTOCTO COUFINF COPAISINF COMUNINF LOC_INF DEXAME RESULT PMM PCRUZ TRA_ESQUEM DSTRAESQUE DTRATA DT_ENCERRA
02B542011-01-0220110120113333045530059922010-12-262010521951-05-034059M6108333304551NaT21530511121231LUAN2011-01-02212400.05112011-01-02NaT
12B542011-01-0320110120113333024022765342011-01-012010521995-02-274015M64333302401NaT8992102011-01-034NaN42011-01-04NaT
22B542011-01-0420110120113333045533754712010-12-282010521974-06-144036M6109333304551NaT111212138LAS2011-01-042420.0399ARTESUNATO + MEFLOQUINA2011-01-04NaT
32B542011-01-0620110120113333045522801832011-01-052011011990-11-114020M6206333304551NaT2112102011-01-061NaNNoneNaT
42B542011-01-0620110120113333045522801832011-01-052011011961-04-024049F5206333304551NaT2112102011-01-061NaNNoneNaT
52B542011-01-0620110120113333045522801832010-12-232010512010-02-073010M6210333304551NaT110212312011-01-062NaN399ARTESUNATO + MEFLOQUINA POR 3D2011-01-06NaT
62B542011-01-0720110120113333045522698052010-12-312010521943-10-084067F5108333304551NaT2112102011-01-071NaNNoneNaT
72B542011-01-1020110220113333045522953502011-01-092011021958-09-264052M6108333304551NaT2112102011-01-101NaNNoneNaT
82B542011-01-1120110220113333045522883382010-08-302010351987-04-274023M6105333304551NaT2112102011-01-111NaNNoneNaT
92B542011-01-1220110220113333045530130142011-01-112011021954-11-114056M6909333304551NaT19921231LUAN2011-01-122NaN399ARTESUNATO+MEFLOQUINA2011-01-12NaT

Last rows

TP_NOT ID_AGRAVO DT_NOTIFIC SEM_NOT NU_ANO SG_UF_NOT ID_MUNICIP ID_REGIONA ID_UNIDADE DT_SIN_PRI SEM_PRI DT_NASC NU_IDADE_N CS_SEXO CS_GESTANT CS_RACA CS_ESCOL_N SG_UF ID_MN_RESI ID_RG_RESI ID_PAIS DT_INVEST ID_OCUPA_N CLASSI_FIN AT_ATIVIDA AT_LAMINA AT_SINTOMA TPAUTOCTO COUFINF COPAISINF COMUNINF LOC_INF DEXAME RESULT PMM PCRUZ TRA_ESQUEM DSTRAESQUE DTRATA DT_ENCERRA
1942B542011-12-2520115220113333045530034502011-12-232011511950-10-184061M6103333301501NaT34212011121231LUAN2011-12-252380.0399ARTESUNATO+MEFLOQUINA2011-12-25NaT
1952B542011-12-2620115220113333045527083532011-12-252011521985-11-024026M6405333304551NaT725005111212RO1110040ALTO2011-12-26420120.0512011-12-26NaT
1962B542011-12-2620115220113333045530034502011-12-252011521959-06-194052M6105333304551NaT6220102112102011-12-261NaNNoneNaT
1972B542011-12-2620115220113333045530034502011-12-232011511982-07-304029M6405333304551NaT7155452112102011-12-261NaNNoneNaT
1982B542011-12-2720115220113333045533754712011-12-262011521979-09-294032F5208333302401NaT2410052112102011-12-271NaNNoneNaT
1992B542011-12-2720115220113333045530034502011-12-262011521947-07-014064M6108333304551NaT22210511121231LUAN2011-12-272380.0399ARTESUNATO+ MEFLOQUINA2011-12-27NaT
2002B542011-12-2720115220113333045530059922011-12-212011511952-06-074059M6108333300201NaT2142152112102011-12-271NaNNoneNaT
2012B542011-12-28201152201135355030133120288402011-12-102011491987-09-124024M6108333304551NaT251305199112AM11300002011-12-286NaN212011-12-28NaT
2022B542011-12-2920115220113333045530034502011-12-262011521974-08-014037M6108333304551NaT25252511121297CATA2011-12-292278.0299ARTESUNATO+ MEFLOQUINA2011-12-29NaT
2032B542011-12-2920115220113333045536035392011-11-282011481963-07-284048M6108333304551NaT252105111212140BANG2011-12-2910380.0312011-12-29NaT